Learning with Unlabeled Data
Authors
Abstract
Thesis entitled "Learning with Unlabeled Data", submitted by XU, Zenglin for the degree of Doctor of Philosophy at The Chinese University of Hong Kong in January 2009.
We consider the problem of learning from both labeled and unlabeled data by analyzing the quality of the unlabeled data. Learning from both labeled and unlabeled data is usually regarded as semi-supervised learning, where the unlabeled data and the labeled data are assumed to be generated from the same distribution. When this assumption is not satisfied, new learning paradigms are needed in order to effectively exploit the information in the unlabeled data. This thesis consists of two parts: the first part analyzes the fundamental assumptions of semi-supervised learning and proposes a few efficient semi-supervised learning models; the second part discusses three learning frameworks for the case in which the unlabeled data do not satisfy the conditions of semi-supervised learning. In the first part, we deal with unlabeled data that are of good quality and satisfy the conditions of semi-supervised learning. First, we present a novel method for the Transductive Support Vector Machine (TSVM) that relaxes the unknown labels to continuous variables and reduces the non-convex optimization problem to a convex semi-definite programming problem. In contrast to the previous relaxation method, which involves O(n^2) free parameters in the semi-definite matrix, our method takes
Similar resources
Consistency of Lipschitz learning with infinite unlabeled data and finite labeled data
We prove that Lipschitz learning on graphs is consistent with the absolutely minimal Lipschitz extension problem in the limit of infinite unlabeled data and finite labeled data. In particular, we show that the continuum limit is independent of the distribution of the unlabeled data, which suggests the algorithm is fully supervised (and not semisupervised) in this setting. We also present some n...
Semi-Supervised Learning of Gaussian Classifiers
In this paper we present an approach that trains Gaussian classifiers using labeled and unlabeled data. Training with unlabeled data introduces efficiency in terms of time and energy spent for labeling the data. We present experiments on different data sets to illustrate the effect of unlabeled data on the performance of the classifiers. We will try to show that under specific conditions unlabe...
Learning from partially labeled data
The Problem: Learning from data with both labeled training points (x,y pairs) and unlabeled training points (x alone). For the labeled points, supervised learning techniques apply, but they cannot take advantage of the unlabeled points. On the other hand, unsupervised techniques can model the unlabeled data distribution, but do not exploit the labels. Thus, this task falls between traditional s...
Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains
There has been increased interest in devising learning techniques that combine unlabeled data with labeled data – i.e. semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques and different types and amounts of labeled and unlabeled data. Moreover, most of the published work on semi-supervised learning techniques assumes that the lab...
Estimate Unlabeled-Data-Distribution for Semi-supervised PU Learning
Traditional supervised classifiers use only labeled data (features/label pairs) as the training set, while the unlabeled data is used as the testing set. In practice, it is often the case that the labeled data is hard to obtain and the unlabeled data contains the instances that belong to the predefined class beyond the labeled data categories. This problem has been widely studied in recent year...